COMPARISON OF EXPERIMENTS

DAVID  BLACKWELL 

HOWARD  UNIVERSITY 


1.  Summary 

Bohnenblust, Shapley, and Sherman [2] have introduced a method of comparing
two sampling procedures or experiments; essentially their concept is that one
experiment $\alpha$ is more informative than a second experiment $\beta$,
$\alpha \supset \beta$, if, for every possible risk function, any risk attainable
with $\beta$ is also attainable with $\alpha$. If $\alpha$ is a sufficient statistic
for a procedure equivalent to $\beta$, $\alpha \succ \beta$, it is shown that
$\alpha \supset \beta$. In the case of dichotomies, the converse is proved. Whether
$\succ$ and $\supset$ are equivalent in general is not known. Various properties of
$\succ$ and $\supset$ are obtained, such as the following: if $\alpha \succ \beta$
and $\gamma$ is independent of both, then the combination
$(\alpha, \gamma) \succ (\beta, \gamma)$. An application to a problem in
$2 \times 2$ tables is discussed.

2.  Definitions 

An experiment $\alpha$ is a set of $N$ probability measures $u_1, \ldots, u_N$ on a
Borel field $B$ of subsets of a space $X$. The $N$ measures are considered as $N$
possible distributions over $X$, and performing the experiment consists of observing
a sample point $x \in X$. A decision problem is a pair $(\alpha, A)$, where $A$ is a
bounded subset of $N$-space. The points $a \in A$ are considered as the possible
actions open to the statistician; the loss from action $a = (a_1, \ldots, a_N)$ is
$a_i$ if the actual distribution of $x$ is $u_i$. A decision procedure $f$ for
$(\alpha, A)$ is a $B$-measurable function from $X$ into $A$, specifying the action
$a$ to be taken as a function of the sample point $x$ obtained by the experiment.
With every $f = [a_1(x), \ldots, a_N(x)]$ is associated a loss vector

$$v(f) = \left( \int a_1(x)\,du_1, \ldots, \int a_N(x)\,du_N \right);$$

the $i$-th component of $v(f)$ is the expected loss from $f$ if $x$ has distribution
$u_i$. The range of $v(f)$ is a subset of $N$-space which we denote by
$R_1(\alpha, A)$; the convex closure of $R_1(\alpha, A)$ will be denoted by
$R(\alpha, A)$ and will be called the set of attainable loss vectors in
$(\alpha, A)$; every vector in $R$ is either attainable or approximable by a
randomized mixture of $N + 1$ decision procedures.
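
The definition of the loss vector can be made concrete for a finite sample space. The following is a minimal sketch, not part of the original paper: the function name `loss_vector`, the two-point experiment `u`, and the procedure `f` are illustrative assumptions, and the integrals reduce to finite sums.

```python
from fractions import Fraction as F

def loss_vector(measures, procedure):
    """Return v(f) for a finite experiment.

    measures[i][x] is u_i({x}); procedure[x] is the action (a_1, ..., a_N)
    chosen at sample point x. Component i is sum_x a_i(x) * u_i({x}).
    """
    n = len(measures)
    return tuple(
        sum(procedure[x][i] * measures[i][x] for x in range(len(procedure)))
        for i in range(n)
    )

# Hypothetical example: N = 2 distributions on X = {0, 1}.
u = [[F(3, 4), F(1, 4)],   # u_1
     [F(1, 4), F(3, 4)]]   # u_2
# Each action is a loss pair: (0, 1) costs nothing under u_1 and 1 under u_2.
f = [(0, 1), (1, 0)]       # action (0, 1) after x = 0, action (1, 0) after x = 1
v = loss_vector(u, f)
print(v)                   # both components equal 1/4 for this example
```

Exact rational arithmetic is used so that the components can be compared without rounding error.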

Theorem 1. $R(\alpha, A) = R(\alpha, A_1) = R_1(\alpha, A_1)$, where $A_1$ is the convex closure of $A$.

This theorem permits us to restrict attention to closed convex $A$, which we
shall do in the following sections. The proof of the theorem will not be given here;
it is straightforward except for the fact that $R(\alpha, A_1) = R_1(\alpha, A_1)$.
This fact follows from the result that whenever $A$ is closed, so is
$R_1(\alpha, A)$, which has been proved elsewhere by the author [1].

Following Bohnenblust, Shapley and Sherman [2], we shall say that $\alpha$ is more
informative than $\beta$, written $\alpha \supset \beta$, if for every $A$ we have
$R(\alpha, A) \supset R(\beta, A)$. It is an immediate consequence of theorem 1 that
if $R(\alpha, A) \supset R(\beta, A)$ for every closed convex $A$, then
$\alpha \supset \beta$.


3. Conditions equivalent to $\alpha \supset \beta$

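Theorem 2. The following conditions are equivalent to $\alpha \supset \beta$.

(1) For every $A$ and every $v \in R(\beta, A)$, there is a $v^* \in R(\alpha, A)$ with $v^*_i \le v_i$ for all $i$.

(2) For every $A$ and every choice of $c_i \ge 0$, $\sum_i c_i = 1$,

$$\min_{v \in R(\alpha, A)} \sum_i c_i v_i \le \min_{v \in R(\beta, A)} \sum_i c_i v_i .$$

(3) For every $A$,

$$\min_{v \in R(\alpha, A)} \sum_i v_i \le \min_{v \in R(\beta, A)} \sum_i v_i .$$

(4) For every $A$,

$$\min_{v \in R(\alpha, A)} \left( \max_i v_i \right) \le \min_{v \in R(\beta, A)} \left( \max_i v_i \right) .$$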

Proof. The implications $\alpha \supset \beta \to (1) \to (2) \to (3)$, $(1) \to (4)$
are immediate. We show that (3) implies $\alpha \supset \beta$. Let
$d_1, \ldots, d_N$ be any constants, and let $T$ be the linear transformation
$Tv = (d_1 v_1, \ldots, d_N v_N)$. Then $R(\alpha, TA) = TR(\alpha, A)$ and

$$\min_{v \in R(\alpha, TA)} \sum_i v_i = \min_{v \in R(\alpha, A)} \sum_i d_i v_i ,$$

and similarly for $\beta$. Thus (3) yields that for all $A$, $d_1, \ldots, d_N$,

$$\min_{v \in R(\alpha, A)} \sum_i d_i v_i \le \min_{v \in R(\beta, A)} \sum_i d_i v_i :$$

every supporting hyperplane of $R(\alpha, A)$ lies on one side of $R(\beta, A)$, so
that $R(\alpha, A) \supset R(\beta, A)$. Finally, we show that (4) implies (2). For
any $A$ and any $c_i \ge 0$, $\sum_i c_i = 1$, let $v_0 \in R(\beta, A)$ be a point
where $\sum_i c_i v_i$ assumes its minimum value over $R(\beta, A)$, and let $U$ be
the linear transformation $Uv = v - v_0$. Then

$$\min_{v \in R(\beta, UA)} \sum_i c_i v_i = 0 = \min_{v \in R(\beta, UA)} \left( \max_i v_i \right) .$$

Applying (4) to $UA$ yields $\min_{v \in R(\alpha, UA)} (\max_i v_i) \le 0$, so that
$\min_{v \in R(\alpha, UA)} \sum_i c_i v_i \le 0$. Thus

$$\min_{v \in R(\alpha, A)} \sum_i c_i v_i \le \sum_i c_i v_{0i} ,$$

so that (2) holds.


4.  Reduction  to  standard  experiment 

For any $\alpha$, let $p_i(x)$, $i = 1, \ldots, N$, be the density of $u_i$ with
respect to $Nu_0 = u_1 + \cdots + u_N$, so that for any $S \in B$,
$u_i(S) = \int_S N p_i(x)\,du_0$. Then $p_i \ge 0$, $\sum_i p_i = 1$ except on a set
of $u_0$ measure zero, and we may redefine $p_i$ there so that the conditions hold
identically. Let $P$ be the set of $N$-tuples $p = (p_1, \ldots, p_N)$,
$p_i \ge 0$, $\sum_i p_i = 1$, and define, for any Borel subset $A$ of $P$,
$m_i(A) = u_i\{p(x) \in A\}$, where $p(x) = [p_1(x), \ldots, p_N(x)]$, so that $m_i$
is the distribution of $p$ when $x$ has distribution $u_i$. Since $p(x)$ is a
sufficient statistic for $x$, considering $i$ as the parameter, we would expect that
the experiment $\alpha^*$ with measures $m_1, \ldots, m_N$ on $P$ is equivalent to
$\alpha$. This fact was noted in [2] for the case in which the set $A$ of actions
has only a finite number of extreme points, and is embodied in

Theorem 3. For every $A$, $R(\alpha, A) = R(\alpha^*, A)$.

Proof. We shall use the notation $f \in (\alpha, A)$ to indicate that $f$ is a
decision procedure for the experiment $(\alpha, A)$. For any
$f^* = [a_1(p), \ldots, a_N(p)] \in (\alpha^*, A)$, define
$f = \{a_1[p(x)], \ldots, a_N[p(x)]\}$, so that $f \in (\alpha, A)$. Since $p$ has
the same distribution on $P$ with respect to $m_i$, $i = 0, \ldots, N$,
$Nm_0 = m_1 + \cdots + m_N$, as $p(x)$ on $X$ with respect to $u_i$, for any Borel
function $g(p)$ we have $\int g(p)\,dm_i = \int g[p(x)]\,du_i$. Choosing
$g(p) = a_i(p)$ yields $v(f^*) = v(f)$, so that
$R(\alpha^*, A) \subset R(\alpha, A)$. For the reverse inclusion, let
$f = [a_1(x), \ldots, a_N(x)] \in (\alpha, A)$, let $a_i^*(p) = E(a_i \mid p)$, the
conditional expectation of $a_i$ given $p$, with $u_0$ as the basic probability
measure on $X$, and let $f^* = [a_1^*(p), \ldots, a_N^*(p)]$. Then for any Borel
function $g(p)$, we have $\int a_i(x) g(p)\,du_0 = \int a_i^*(p) g(p)\,dm_0$.
Choosing $g(p) = N p_i$ and using $\int a_i(x) N p_i\,du_0 = \int a_i(x)\,du_i$ and
$\int a_i^*(p) N p_i\,dm_0 = \int a_i^*(p)\,dm_i$ yields that $v(f) = v(f^*)$; it
remains to show that $f^* \in (\alpha^*, A)$, that is, that the values of $f^*$ are
in $A$. If not, there is a linear function $L(a)$ with $L(a) \le 0$ for $a \in A$,
$u_0\{L[f^*(p(x))] > 0\} > 0$. Then $\int_S L[f(x)]\,du_0 \le 0$, while
$\int_S L[f^*(p(x))]\,du_0 > 0$, where $S = \{L[f^*(p(x))] > 0\}$, so that the two
integrals cannot be equal, contrary to the definition of conditional expectation.
Thus $f^* \in (\alpha^*, A)$, and the proof is complete.

Thus every experiment $\alpha$ is equivalent in the sense of theorem 3 to an
experiment $\alpha^*$ whose outcome is a point $p \in P$. The experiment $\alpha^*$
is called the standard experiment associated with $\alpha$. Note that the measures
$m_1, \ldots, m_N$ of the standard experiment $\alpha^*$ are completely determined
by $m_0 = (m_1 + \cdots + m_N)/N$, since the density of $m_i$ with respect to
$Nm_0$ is simply $p_i$, and that the standard experiment associated with $\alpha^*$
is simply $\alpha^*$. Moreover, any probability measure $m_0$ over $P$ such that
$\int N p_i\,dm_0 = 1$ for $i = 1, \ldots, N$ is the $m_0$ of a standard experiment
$\alpha^*$, with $m_1, \ldots, m_N$ defined by $m_i(S) = N \int_S p_i\,dm_0$; the
class of standard experiments is essentially equivalent to the class of probability
measures over $P$ with mean $(1/N, \ldots, 1/N)$. The $m_0$ of the standard
experiment of an experiment $\alpha$ will be called the standard measure of
$\alpha$; for two standard measures $M$, $m$ of experiments $\alpha$, $\beta$, the
notation $M \supset m$ means that $\alpha \supset \beta$.
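
For a finite sample space the reduction to the standard experiment is a short computation. The sketch below is an illustration under that finiteness assumption; the names `standard_measure` and `component_measure` and the example experiment `u` are hypothetical and do not come from the paper.

```python
from fractions import Fraction as F
from collections import defaultdict

def standard_measure(measures):
    """Return the standard measure m_0 of a finite experiment as {p: mass}.

    measures[i][x] is u_i({x}). Each x with positive total mass is sent to
    p(x) = (u_1(x), ..., u_N(x)) / sum_j u_j(x) and carries u_0-mass
    sum_j u_j(x) / N. Sample points with the same p(x) are merged.
    """
    n = len(measures)
    m0 = defaultdict(F)
    for x in range(len(measures[0])):
        total = sum(measures[i][x] for i in range(n))
        if total == 0:
            continue  # u_0-null point: p(x) may be defined arbitrarily there
        p = tuple(measures[i][x] / total for i in range(n))
        m0[p] += total / n
    return dict(m0)

def component_measure(m0, i):
    """Recover m_i from m_0 through the density N * p_i."""
    n = len(next(iter(m0)))
    return {p: n * p[i] * w for p, w in m0.items()}

# Hypothetical experiment with N = 2 and three sample points; the first two
# points have the same likelihood ratio, so they collapse to one point of P.
u = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 8), F(5, 8)]]
m0 = standard_measure(u)
mean = tuple(sum(p[i] * w for p, w in m0.items()) for i in range(2))
print(m0, mean)
```

In this example the mean of `m0` is (1/2, 1/2), in line with the characterization of standard measures as probability measures over $P$ with mean $(1/N, \ldots, 1/N)$.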

The following theorems, proved in [2], are valuable tools in the actual comparison
of two experiments.

Theorem 4. For two standard measures $M$, $m$, $M \supset m$ if and only if for every
continuous convex $g(p)$, $\int g(p)\,dM \ge \int g(p)\,dm$.

Proof. Let $A$ be the convex set determined by a finite set
$a_i = (a_{i1}, \ldots, a_{iN})$, $i = 1, \ldots, k$, and define
$L_i(p) = \sum_{j=1}^{N} a_{ij} p_j$, $L(p) = \min_i L_i(p)$, $f(p) = a_i$ when
$L_j(p) > L(p)$ for $j < i$ and $L_i(p) = L(p)$. Then $f \in (\alpha, A)$ for any
standard experiment $\alpha$, and for any $f^* \in (\alpha, A)$,
$\sum_{j=1}^{N} p_j a_j^*(p) \ge \sum_{j=1}^{N} p_j a_j(p)$ for all $p$, so that with
$v^* = v(f^*)$, $v = v(f)$,

$$\sum_{j=1}^{N} v_j^* = N \int \sum_{j=1}^{N} a_j^*(p)\, p_j\,dM \ge N \int \sum_{j=1}^{N} a_j(p)\, p_j\,dM = \sum_{j=1}^{N} v_j = N \int L(p)\,dM ,$$

that is, if $\alpha$ has standard measure $M$,

$$\min_{v \in R(\alpha, A)} \sum_{j} v_j = N \int L(p)\,dM .$$

Thus for a pair of standard experiments $\alpha$, $\beta$ with standard measures
$M$, $m$, condition (3) of theorem 2 holds for every $A$ determined by a finite set
if and only if $\int L(p)\,dM \le \int L(p)\,dm$ for every $L(p)$ which is the
minimum of a finite number of linear functions, that is, if and only if
$\int c(p)\,dM \ge \int c(p)\,dm$ for every $c(p)$ which is the maximum of a finite
number of linear functions. It is readily shown by approximation that if condition
(3) of theorem 2 holds for every $A$ determined by a finite set, it holds for all
$A$, and that $\int c(p)\,dM \ge \int c(p)\,dm$ for all $c(p)$ which are maxima of a
finite number of linear functions implies the same inequality for all convex
$c(p)$, and the theorem follows.
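
The identity $\min \sum_j v_j = N \int L(p)\,dM$ used in the proof can be checked numerically on a small case. The following is a sketch under the assumption of a finite sample space and a finite action set; the helper names, the experiment `u`, and the `actions` list are made up for illustration. Because the total loss is linear in $v$, its minimum over the convex set of attainable loss vectors is reached at a deterministic procedure, so a brute-force search over deterministic procedures suffices here.

```python
from fractions import Fraction as F
from itertools import product

def min_total_loss_bruteforce(measures, actions):
    """Minimum of sum_i v_i(f) over all deterministic procedures f."""
    n, xs = len(measures), range(len(measures[0]))
    best = None
    for choice in product(actions, repeat=len(measures[0])):
        total = sum(choice[x][i] * measures[i][x] for i in range(n) for x in xs)
        best = total if best is None else min(best, total)
    return best

def min_total_loss_formula(measures, actions):
    """N * integral of L(p) dM, with L(p) = min over actions of sum_j a_j p_j.

    The integral over the standard measure is evaluated as a sum over sample
    points x of L(p(x)) times the u_0-mass of x.
    """
    n = len(measures)
    result = F(0)
    for x in range(len(measures[0])):
        total = sum(measures[i][x] for i in range(n))
        if total == 0:
            continue
        p = [measures[i][x] / total for i in range(n)]
        mass = total / n
        result += mass * min(sum(a[j] * p[j] for j in range(n)) for a in actions)
    return n * result

# Hypothetical data: two distributions on three points, three actions.
u = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 8), F(5, 8)]]
actions = [(0, 1), (1, 0), (F(3, 10), F(3, 10))]
brute = min_total_loss_bruteforce(u, actions)
formula = min_total_loss_formula(u, actions)
print(brute, formula)
```

The brute-force search is exponential in the number of sample points and is meant only as a cross-check of the closed form on toy inputs.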

Theorem 5. If $N = 2$, $M \supset m$ if and only if
$\int_0^y F_M(x)\,dx \ge \int_0^y F_m(x)\,dx$ for all $y$, where
$F_M(x) = M\{p_1 \le x\}$, $F_m(x) = m\{p_1 \le x\}$.

Proof. Define $c_y(x) = y - x$ for $x \le y$, $c_y(x) = 0$ for $x \ge y$. Every
convex function $c(x)$ on $(0, 1)$ can be uniformly approximated by a linear
function plus functions of the form $\sum_{i=1}^{K} a_i c_{y_i}(x)$, where
$a_i \ge 0$, so that, from theorem 4, $M \supset m$ if and only if
$\int c_y(x)\,dM \ge \int c_y(x)\,dm$ for all $y$. Now

$$\int c_y(x)\,dM = \int_0^y (y - x)\,dM = \int_0^y F_M(x)\,dx ,$$

integrating by parts, and similarly for $\int c_y(x)\,dm$, so that the proof is
complete.
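
For measures concentrated at finitely many points, the criterion of theorem 5 is easy to evaluate. The sketch below assumes such measures, represented as dictionaries from the point $x = p_1$ to its mass; the names `c` and `more_informative` and the example measures are illustrative assumptions. Since both integrated distribution functions are then piecewise linear with kinks only at atoms, comparing them at the atoms and at the endpoints 0 and 1 is enough.

```python
from fractions import Fraction as F

def c(measure, y):
    """Integral of F_M from 0 to y, i.e. sum_k w_k * max(y - x_k, 0)."""
    return sum(w * max(y - x, 0) for x, w in measure.items())

def more_informative(M, m):
    """Theorem 5 criterion for finitely supported standard measures (N = 2)."""
    knots = set(M) | set(m) | {F(0), F(1)}
    return all(c(M, y) >= c(m, y) for y in knots)

# Hypothetical standard measures on [0, 1], each with mean 1/2.
M = {F(1, 4): F(1, 2), F(3, 4): F(1, 2)}
m = {F(1, 3): F(1, 2), F(2, 3): F(1, 2)}
M2 = {F(0): F(1, 4), F(2, 3): F(3, 4)}

print(more_informative(M, m), more_informative(m, M))
print(more_informative(M, M2), more_informative(M2, M))
```

In this example `M` dominates `m`, while `M` and `M2` are not comparable in either direction, which shows that the ordering is only partial.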


5. Sufficiency

A standard experiment $\alpha$ with measure $M$ is said to be sufficient for a
standard experiment $\beta$ with measure $m$, written $\alpha \succ \beta$ or
$M \succ m$, if there is a function $Q(p, E)$, defined for each $p \in P$ and each
Borel set $E$ of $P$ such that (1) for fixed $p$, $Q$ is a probability measure over
$P$, (2) for fixed $E$, $Q$ is a Borel function of $p$, and (3) for every $E$,
$m_i(E) = \int Q(p, E)\,dM_i(p)$, $i = 1, \ldots, N$, where $m_1, \ldots, m_N$,
$M_1, \ldots, M_N$ are the measures over $P$ associated with $m$, $M$ respectively,
that is, if there is an experiment $\gamma$ over the space $P_1 \times P_2$ with
measures $m_i^*$ such that the distributions of $p_1$, $p_2$ with respect to
$m_i^*$ are $M_i$, $m_i$ and that $p_1$ is a sufficient statistic for $(p_1, p_2)$
with respect to $m_1^*, \ldots, m_N^*$. That the second formulation is equivalent to
the first follows from an unpublished result of Doob that conditional distributions
of real or vector variables with respect to real or vector variables can always be
defined so as to be probability measures; we shall use this fact several times in
what follows. Essentially, $M \succ m$ means that, if $p$ is the result of
experiment $M$, then a vector $p'$ selected according to the distribution
$Q(p, E)$ will be as informative as a $p^*$ resulting from experiment $m$, in the
sense that for each $i$, $p'$ and $p^*$ have the same distribution.

Theorem 6. $M \succ m$ if and only if there is a function $D(p, E)$ such that (4)
for fixed $p$, $D$ is a probability measure over $P$, (5) for fixed $E$, $D$ is a
Borel function of $p$, (6) $\int p\,dD(p^*, p) = p^*$, and (7) for every $E$,
$M(E) = \int D(p, E)\,dm(p)$.

Proof. Suppose $M \succ m$, and let $i$, $p_1$, $p_2$ be chance variables whose
joint distribution is specified as follows: $i = 1, \ldots, N$, each with
probability $1/N$; the conditional distribution of $p_1$ given $i$ is $M_i$, and
the conditional distribution of $p_2$ given $i$, $p_1$ is $Q(p_1, E)$, a function of
$p_1$ only. Then $p_1$, $p_2$ have distributions $M$, $m$ respectively, and $m_i$ is
the conditional distribution of $p_2$ given $i$. There is a determination of
$D(p_2, E)$, the conditional probability given $p_2$ that $p_1 \in E$, such that for
each $p_2$, $D$ is a probability measure over $P$, and for any $g(p_1)$,
$E(g \mid p_2) = \int g(p)\,dD(p_2, p)$. This $D$ then satisfies conditions (4),
(5), and (7) of the theorem, and (6) will be proved if we show that
$p_{2i_0} = E(p_{1i_0} \mid p_2)$ for $i_0 = 1, \ldots, N$, where $p_{ki}$ is the
$i$-th coordinate of $p_k$, $k = 1, 2$.

We first verify that the probability $\Pr\{i = i_0 \mid p_k\} = p_{ki_0}$. This is
equivalent to the statement that, for any $S$,
$\Pr(i = i_0, p_1 \in S) = \int_S p_{1i_0}\,dM$, and a similar statement with $M$
replaced by $m$ for $k = 2$. Since $N p_{i_0}$ is the density of $M_{i_0}$ with
respect to $M$,

$$\int_S p_{1i_0}\,dM = \frac{1}{N} M_{i_0}(S) = \Pr\{i = i_0\} \Pr\{p_1 \in S \mid i = i_0\} ,$$

and similarly for $k = 2$. Moreover,
$\Pr\{i = i_0 \mid p_2\} = E\{\Pr(i = i_0 \mid p_1, p_2) \mid p_2\}$, so that to show
that $p_{2i_0} = E(p_{1i_0} \mid p_2)$, it is sufficient to show that
$E\{\Pr(i = i_0 \mid p_1, p_2) \mid p_2\} = E(p_{1i_0} \mid p_2)$, and this will
follow from (8) $\Pr\{i = i_0 \mid p_1, p_2\} = \Pr\{i = i_0 \mid p_1\}$. We postpone
the proof of (8).

Now suppose there is a function $D$ satisfying the conditions of the theorem.
Let $i$, $p_1$, $p_2$ be chance variables whose joint distribution is specified as
follows: $p_2$ has distribution $m$; the conditional distribution of $p_1$ given
$p_2$ is $D(p_2, E)$; and the conditional probability that $i = i_0$ given $p_1$,
$p_2$ is $p_{1i_0}$, a function of $p_1$ only. Condition (6) says that
$E(p_{1i} \mid p_2) = p_{2i}$, so that
$\Pr\{i = i_0 \mid p_2\} = E\{\Pr(i = i_0 \mid p_1, p_2) \mid p_2\} = E(p_{1i_0} \mid p_2) = p_{2i_0}$,
and condition (7) guarantees that $p_1$ has distribution $M$. We next show that
$\Pr\{p_1 \in E \mid i\} = M_i(E)$, $\Pr\{p_2 \in E \mid i\} = m_i(E)$, that is, that
$\Pr\{i = i_0, p_1 \in E\} = \Pr\{i = i_0\} M_{i_0}(E)$ and
$\Pr\{i = i_0, p_2 \in E\} = \Pr\{i = i_0\} m_{i_0}(E)$. Since
$\Pr\{i = i_0 \mid p_1\} = p_{1i_0}$,
$\Pr\{i = i_0, p_1 \in E\} = \int_E p_{1i_0}\,dM = M_{i_0}(E)/N$; similarly,
$\Pr\{i = i_0, p_2 \in E\} = m_{i_0}(E)/N$, so that we need simply note that
$\Pr\{i = i_0\} = \int p_{1i_0}\,dM = 1/N$, since $M$ is a standard measure.
Let $Q(p, E)$ be the conditional distribution of $p_2$ given $p_1$. Then
requirements (1), (2) hold. Requirement (3) may be written
$\Pr\{p_2 \in E \mid i\} = E\{\Pr(p_2 \in E \mid p_1) \mid i\}$, or
$E\{\Pr(p_2 \in E \mid p_1, i) \mid i\} = E\{\Pr(p_2 \in E \mid p_1) \mid i\}$, which
will follow from (9) $\Pr\{p_2 \in E \mid p_1, i\} = \Pr\{p_2 \in E \mid p_1\}$.

The  proof  of  the  theorem  is  now  complete  except  for  (8)  and  (9),  which  are 
special  cases  of 

Theorem  7.  If  x,  y,  z  are  chance  variables  such  that  the  distribution  of  z  given  x,  y 
is  a  function  of  y  only,  then  the  distribution  of  x  given  y,  z  is  a  function  of  y  only. 

Proof. If $h(y, z)$ is the characteristic function of a set depending only on
$y$, $z$ and $g(x)$ is the characteristic function of a set depending only on $x$,
we must show that $E(gh) = E[E(g \mid y) h]$. We prove the equation when
$h(y, z) = h_1(y) h_2(z)$; the general result follows by approximation. We have
$E[E(g \mid y) h_1 h_2] = E\{E[g h_1 E(h_2 \mid y) \mid y]\} = E[g h_1 E(h_2 \mid y)] = E[g h_1 E(h_2 \mid x, y)] = E(g h_1 h_2)$.
This completes the proof.

Theorem  7  asserts  essentially  that  a  Markoff  chain  is  also  a  Markoff  chain  in 
reverse,  a  fact  noted  in  varying  degrees  of  generality  by  several  writers.  The  proof 
given  here  seems  particularly  simple. 

Theorem 6 can be restated as follows: $M \succ m$ if and only if there are chance
variables $p_1$, $p_2$ with distributions $M$, $m$ such that
$E(p_1 \mid p_2) = p_2$.

Theorem 8. If $M \succ m$, then $M \supset m$.

Proof. For every continuous convex $g(p)$,

$$\int g(p)\,dM = \int \left[ \int g(p)\,dD(p', p) \right] dm(p') ,$$

where $D$ is the set of measures whose existence is asserted by theorem 6.
Since $g$ is convex,
$\int g(p)\,dD(p', p) \ge g\left[ \int p\,dD(p', p) \right] = g(p')$, so that
$\int g(p)\,dM \ge \int g(p)\,dm$ and $M \supset m$.

Thus theorems 4 and 6 reduce theorem 8 to a special case of the fact, noted
by Hodges and Lehmann [4] and Doob (unpublished manuscript) that for any
continuous convex $g$ and any chance variables $x$, $y$,
$E[g(x)] \ge E\{g[E(x \mid y)]\}$.
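
Theorem 8 can be illustrated on a finite example by passing the outcome of an experiment through a randomizing kernel. The sketch below rests on assumptions that go beyond the text: the sample spaces are finite, the kernel acts on sample points rather than on points of $P$, and the names `garble` and `expect_under_standard_measure` as well as the matrices `U` and `K` are hypothetical. The garbled experiment is one for which the original is sufficient, so every convex $g$ should have an integral under the original standard measure at least as large as under the garbled one.

```python
from fractions import Fraction as F

def garble(measures, kernel):
    """v_i(y) = sum_x u_i(x) K(x, y): observe x, then draw y from row x of K."""
    ys = range(len(kernel[0]))
    return [[sum(u[x] * kernel[x][y] for x in range(len(u))) for y in ys]
            for u in measures]

def expect_under_standard_measure(measures, g):
    """Integral of g(p) dM, computed as sum_x g(p(x)) * u_0({x})."""
    n = len(measures)
    value = F(0)
    for x in range(len(measures[0])):
        total = sum(u[x] for u in measures)
        if total:
            value += g([u[x] / total for u in measures]) * total / n
    return value

# Hypothetical experiment and kernel (each row of K sums to 1).
U = [[F(3, 4), F(1, 4)],
     [F(1, 4), F(3, 4)]]
K = [[F(2, 3), F(1, 3)],
     [F(1, 3), F(2, 3)]]
V = garble(U, K)

def sum_of_squares(p):
    return sum(q * q for q in p)

for g in (sum_of_squares, max):   # two convex functions of p
    print(expect_under_standard_measure(U, g), expect_under_standard_measure(V, g))
```

Only two convex functions are tried here, so this is a spot check of the inequality, not a test of $M \supset m$ itself.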


6. Equivalence of $\succ$ and $\supset$ for $N = 2$

In this section we consider only the case $N = 2$, so that
$P = \{(p_1, p_2)\}$, $p_i \ge 0$, $p_1 + p_2 = 1$. For simplicity of notation, we
denote the point $(p_1, p_2)$ by the number $x = p_1$, $0 \le x \le 1$, so that a
standard measure becomes simply a probability measure defined for Borel subsets of
$(0, 1)$ such that $\int_0^1 x\,dM = \tfrac{1}{2}$. For any standard measure $M$,
we write $F_M(y) = M\{x \le y\}$, $c_M(y) = \int_0^y F_M(x)\,dx$. Then $c_M$ is a
nondecreasing convex function of $y$, $c_M(0) = 0$, $c_M(1) = \tfrac{1}{2}$, and,
according to theorem 5, $M \supset m$ if and only if $c_M(y) \ge c_m(y)$ for all
$y$.

A class of measures $D(x, E)$ such that $D$ is for each $x \in (0, 1)$ a
probability measure over $(0, 1)$, for each $E$ a Borel function of $x$, and
$\int_0^1 y\,dD(x, y) = x$ is called a transformation $T$, and for any standard
measure $m$, the standard measure $M(E) = \int D(x, E)\,dm$ will be denoted by
$Tm$. Theorem 6, for $N = 2$, asserts that $M \succ m$ if and only if there is a
transformation $T$ with $Tm = M$.

Theorem 9. For any sequence of transformations $T_1, T_2, \ldots$, there is a
transformation $T$ such that for any standard measure $m$,
$F_{m_k}(y) \to F_{Tm}(y)$ at every point of continuity of $F_{Tm}$, where
$m_k = T_k \cdots T_1 m$.

Proof. Let $\Omega$ be the space of sequences $\omega = (x_0, x_1, \ldots)$,
$0 \le x_k \le 1$. For any $a$, $0 \le a \le 1$, there is a probability measure
$P_a$, defined for Borel sets of $\Omega$, such that $P_a\{x_0 = a\} = 1$ and
$P_a\{x_k \in E \mid x_0, \ldots, x_{k-1}\} = D_k(x_{k-1}, E)$, where $D_k$ is the
set of measures defining $T_k$. Then $E(x_{k+1} \mid x_0, \ldots, x_k) = x_k$, so
that, by induction on $j$,
$E(x_{k+j} \mid x_0, \ldots, x_k) = E[E(x_{k+j} \mid x_0, \ldots, x_{k+j-1}) \mid x_0, \ldots, x_k] = E(x_{k+j-1} \mid x_0, \ldots, x_k) = x_k$
for all $j \ge 1$. Thus, $x_0, x_1, \ldots$ is a martingale; since
$0 \le x_k \le 1$, a theorem of Doob [3] asserts that there is a chance variable
$x$ such that $x_k \to x$ with probability 1, and that
$E(x \mid x_0, \ldots, x_k) = x_k$. In particular $E(x) = E(x_0) = a$. Let
$D(a, E) = P_a\{x \in E\}$. We shall show that the set of measures $D(a, E)$,
$0 \le a \le 1$, is the required transformation $T$.

For any Borel function $g(x_0, \ldots, x_k)$

$$(10) \quad \int g\,dP_a = \int \int \cdots \int g(x_0, \ldots, x_k)\,dD_k(x_{k-1}, x_k) \cdots dD_1(x_0, x_1)\,dI_a(x_0) ,$$

where $I_a$ is the measure concentrated at $a$, so that $\int g\,dP_a$ is a Borel
function of $a$. The class $\mathcal{G}$ of sets $S$ for which $P_a(S)$ is a Borel
function of $a$ is a normal class [7, p. 83] which includes all
$(x_0, \ldots, x_k)$-Borel sets, so that $\mathcal{G}$ [5, p. 83] includes all Borel
sets of $\Omega$. In particular, $P_a\{x \in E\} = D(a, E)$ is a Borel function of
$a$, so that $D(a, E)$ is a transformation $T$. For any standard measure $m$,
define, for all Borel subsets $S$ of $\Omega$, $P_m(S) = \int P_a(S)\,dm(a)$.
Then for every $g(\omega)$,
$\int g\,dP_m = \int \left[ \int g\,dP_a \right] dm(a)$. Letting $g$ be the
characteristic function of an $x_k$-set and using (10) shows that the distribution
of $x_k$ is $m_k$. Also the distribution of $x$ is $Tm$, and $x_k \to x$ with
$P_m$-probability 1, so that $F_{m_k}(y) \to F_{Tm}(y)$ at all points of continuity
of $F_{Tm}$.

Theorem 10. For $N = 2$, if $M \supset m$, then $M \succ m$.

Proof. We shall construct a sequence of transformations $T_1, T_2, \ldots$ such
that $c_{m_k}(y) \to c_M(y)$ for all $y$, where $m_k = T_k \cdots T_1 m$. Then
$c_M(y) = c_{Tm}(y)$ for all $y$, where $T$ is the transformation whose existence is
asserted in theorem 9, so that $M = Tm$. For any subinterval $(a, b)$ of $(0, 1)$,
let $T(a, b)$ be the transformation defined by

$$D(x, E) = \frac{b - x}{b - a} I_a(E) + \frac{x - a}{b - a} I_b(E) \quad \text{for } a \le x \le b ,$$

$$D(x, E) = I_x(E) \quad \text{for } x \text{ outside } (a, b) .$$

It is easily verified that for any measure $m$,

$$c_{T(a,b)m}(x) = c_m(x) \quad \text{for } x \text{ outside } (a, b) ,$$

$$c_{T(a,b)m}(x) = \frac{b - x}{b - a} c_m(a) + \frac{x - a}{b - a} c_m(b) \quad \text{for } a \le x \le b .$$

Since $M \supset m$, $c_M(x) \ge c_m(x)$ for all $x$. At any point
$[t_1, c_M(t_1)]$ of the curve $y = c_M(x)$, draw a tangent, intersecting
$y = c_m(x)$ say at $x = a_1$, $x = b_1$, where $a_1 \le t_1 \le b_1$. Then, with
$T_1 = T(a_1, b_1)$, $c_{T_1 m} \le c_M$ with equality at $x = t_1$. Applying the
same process to $y = c_{T_1 m}$ from a point $[t_2, c_M(t_2)]$ and continuing in
this way, using a sequence $t_1, t_2, \ldots$ dense in $(0, 1)$, yields a sequence
$T_1, T_2, \ldots$ such that $c_{m_k}(y) \to c_M(y)$ for all $y$.
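
The elementary transformation $T(a, b)$ used in this proof is simple to apply to a measure with finitely many atoms. The sketch below assumes that representation (a dictionary from points to masses); the function names `apply_T` and `c` and the example measure are illustrative only. It moves every atom lying in $[a, b]$ to the two endpoints while preserving the mean, which replaces the integrated distribution function on $[a, b]$ by its chord.

```python
from fractions import Fraction as F
from collections import defaultdict

def apply_T(a, b, m):
    """Push a finitely supported measure m through T(a, b).

    Mass at x in [a, b] is split between a and b with weights
    (b - x)/(b - a) and (x - a)/(b - a); mass outside [a, b] stays put.
    """
    out = defaultdict(F)
    for x, w in m.items():
        if a <= x <= b:
            out[a] += w * (b - x) / (b - a)
            out[b] += w * (x - a) / (b - a)
        else:
            out[x] += w
    return {x: w for x, w in out.items() if w != 0}

def c(measure, y):
    """Integral of the distribution function from 0 to y."""
    return sum(w * max(y - x, 0) for x, w in measure.items())

# Hypothetical standard measure (mean 1/2) and interval.
m = {F(1, 3): F(1, 2), F(2, 3): F(1, 2)}
a, b = F(1, 4), F(1, 2)
Tm = apply_T(a, b, m)
print(Tm)

# Inside [a, b] the new curve is the chord of the old one.
y = F(1, 3)
chord = (b - y) / (b - a) * c(m, a) + (y - a) / (b - a) * c(m, b)
print(c(Tm, y), chord)
```

Iterating `apply_T` with intervals chosen from tangents to a target curve mirrors the construction in the proof, although the limit argument itself is not reproduced here.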

Theorems 6 and 10 combine to yield the following partial converse of the result of
Hodges and Lehmann and Doob mentioned in section 5: If $M$, $m$ are standard
measures in $(0, 1)$ such that $\int g(x)\,dM \ge \int g(x)\,dm$ for every
continuous convex $g$, then there are chance variables $p_1$, $p_2$ with
distributions $M$, $m$ such that $E(p_1 \mid p_2) = p_2$. The requirement that $M$,
$m$ be standard measures on $(0, 1)$ can be immediately weakened so that $M$, $m$
can be any probability measures over a bounded interval $(a, b)$. The extension to
probability measures over $(-\infty, \infty)$ has not been carried out, and the
extension to $N$-dimensional vector variables which, in view of theorem 6, would
imply the equivalence of $\succ$ and $\supset$, remains unsolved. It has been
pointed out by S. Sherman that theorems 5, 8, and 10, for the special case of
measures concentrated at a finite number of points, are given, somewhat disguised,
in [5, theorem 45 and associated results].


7.  Combinations  of  experiments 

For two experiments $\alpha$, $\beta$, the combination $(\alpha, \beta)$ is the
experiment defined by the space $X \times Y$ with the $N$ probability measures
$u_1 \times v_1, \ldots, u_N \times v_N$, where $\alpha = (X, u_1, \ldots, u_N)$,
$\beta = (Y, v_1, \ldots, v_N)$.

Theorem 11. If $\alpha^*$, $\beta^*$ are the standard experiments for $\alpha$,
$\beta$, then the standard experiment for $(\alpha^*, \beta^*)$ is the same as that
for $(\alpha, \beta)$.

Proof. If $N p_i(x)$, $N q_i(y)$ are the densities of $u_i$, $v_i$ with respect to
$u_0$, $v_0$, then

$$d_i(x, y) = N p_i(x) q_i(y) \Big/ \sum_j p_j(x) q_j(y)$$

is the density of $u_i \times v_i$ with respect to
$w_0 = N^{-1} \sum_i u_i \times v_i$. The measure $m$ for the standard experiment
for $(\alpha, \beta)$ is the joint distribution of $d_1, \ldots, d_N$ with respect
to $w_0$. The function $D_i(p, q) = N p_i q_i / \sum_j p_j q_j$ is the density for
the measure $m_i \times M_i$ on $P \times Q$ with respect to the measure
$\gamma_0 = N^{-1} \sum_i m_i \times M_i$, where
$\alpha^* = (P, m_1, \ldots, m_N)$, $\beta^* = (Q, M_1, \ldots, M_N)$, and the
measure $M$ for the standard experiment for $(\alpha^*, \beta^*)$ is the joint
distribution of $D_1, \ldots, D_N$ with respect to $\gamma_0$. Now for each $i$,
$p$ has the same distribution with respect to $m_i$ as $p(x)$ with respect to
$u_i$, and similarly for $q$, $M_i$, $q(y)$, $v_i$, so that $(p, q)$ with respect
to $m_i \times M_i$ has the same distribution as $[p(x), q(y)]$ with respect to
$u_i \times v_i$. Since $D_i$ is the same function of $p$, $q$ that $d_i$ is of
$p(x)$, $q(y)$, the joint distribution of $d_1, \ldots, d_N$ with respect to $w_0$
is the same as that of $D_1, \ldots, D_N$ with respect to $\gamma_0$.
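
Theorem 11 can be checked exactly on small finite experiments. The sketch below is an illustration under a finiteness assumption; the function names `standard_measure`, `standard_experiment`, and `combine` and the two example experiments are hypothetical. It builds the combination directly and from the two standard experiments, then compares the resulting standard measures.

```python
from fractions import Fraction as F
from collections import defaultdict

def standard_measure(measures):
    """Standard measure m_0 of a finite experiment as {p: mass}."""
    n = len(measures)
    m0 = defaultdict(F)
    for x in range(len(measures[0])):
        total = sum(u[x] for u in measures)
        if total:
            m0[tuple(u[x] / total for u in measures)] += total / n
    return dict(m0)

def standard_experiment(measures):
    """Finite experiment on the support of m_0 with m_i({p}) = N p_i m_0({p})."""
    n = len(measures)
    m0 = standard_measure(measures)
    return [[n * p[i] * w for p, w in m0.items()] for i in range(n)]

def combine(alpha, beta):
    """Product experiment: the i-th measure is u_i x v_i on X x Y."""
    return [[ux * vy for ux in u for vy in v] for u, v in zip(alpha, beta)]

# Hypothetical experiments with N = 2.
alpha = [[F(1, 2), F(1, 4), F(1, 4)],
         [F(1, 4), F(1, 8), F(5, 8)]]
beta = [[F(3, 4), F(1, 4)],
        [F(1, 4), F(3, 4)]]

direct = standard_measure(combine(alpha, beta))
via_standard = standard_measure(
    combine(standard_experiment(alpha), standard_experiment(beta)))
print(direct == via_standard)
```

Exact fractions are used so that points of $P$ reached along different routes compare equal as dictionary keys.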

Theorem 12. If $\alpha_1 \succ \alpha_2$ and $\beta_1 \succ \beta_2$ then $(\alpha_1, \beta_1) \succ (\alpha_2, \beta_2)$.

Proof. Since $\succ$ is transitive (this follows from theorem 6), we may suppose
that $\alpha_1 = \alpha_2 = \alpha$; the general result would follow from this
case, since
$(\alpha_1, \beta_1) \succ (\alpha_1, \beta_2) \succ (\alpha_2, \beta_2)$. Let
$\alpha$, $\beta_1$, $\beta_2$ have standard measures $m$, $m'$, $m''$ and let
$X = P_1 \times P_2 \times P_3 \times P_4$; we define a measure $w_i$ on $X$ by the
following specifications: $(p_1, p_2)$ have distribution $m_i \times m_i'$, and the
conditional distribution of $(p_3, p_4)$ for fixed $p_1$, $p_2$ is given by
$\Pr\{p_3 \in S, p_4 \in T \mid p_1, p_2\} = g(p_1) Q(p_2, T)$, where $g$ is the
characteristic function of $S$ and $Q$ is the function whose existence is implied
by $\beta_1 \succ \beta_2$, so that $m_i''(T) = \int Q(p, T)\,dm_i'$. Then
$(p_3, p_4)$ have distribution $m_i \times m_i''$ with respect to $w_i$. The
standard experiments for $(\alpha, \beta_1)$, $(\alpha, \beta_2)$ have measures
$(M_1, \ldots, M_N)$, $(M_1^*, \ldots, M_N^*)$, where $M_i$, $M_i^*$ are the
distributions of $d = (d_1, \ldots, d_N)$ and $D = (D_1, \ldots, D_N)$ with respect
to $w_i$, where

$$d_i = p_{1i}\, p_{2i} \Big/ \sum_j p_{1j}\, p_{2j} \quad \text{and} \quad D_i = p_{3i}\, p_{4i} \Big/ \sum_j p_{3j}\, p_{4j} ,$$

and it is sufficient to show that the conditional distribution of $D$ given $d$ is
independent of $i$. For any function $f(D)$, in fact for any function of
$(p_3, p_4)$, $E(f \mid p_1, p_2) = h(p_1, p_2)$ is independent of $i$, so that we
need show only that $E(h \mid d)$ using measure $m_i \times m_i'$ on
$P_1 \times P_2$ is independent of $i$. Since the density of $m_i \times m_i'$ with
respect to $\sum_j m_j \times m_j'$ is $d_i$, a function of $d$, we conclude by
Neyman factorization [4], that $d$ is a sufficient statistic for the $N$ measures
$m_i \times m_i'$, so that $E(h \mid d)$ is independent of $i$.

The extension of the concept of combination of two independent experiments
and of theorem 12 to the case of combination of $n$ independent experiments is
straightforward, and we obtain that if $\alpha_i \succ \beta_i$,
$i = 1, \ldots, n$, then
$(\alpha_1, \ldots, \alpha_n) \succ (\beta_1, \ldots, \beta_n)$. In particular if
$\alpha \succ \beta$, then the experiment yielding $n$ independent $\alpha$'s is
sufficient for the experiment yielding $n$ independent $\beta$'s. It would be
interesting to know whether conversely $(\alpha, \alpha) \succ (\beta, \beta)$
implies $\alpha \succ \beta$.


8.  Binomial  experiments 

If the space $X$ consists of two points, say 0, 1, an experiment $\alpha$ is simply
the specification of a vector $a = (a_1, \ldots, a_N)$, $0 \le a_i \le 1$, where
$a_i = u_i\{x = 1\}$. For the case $N = 2$, a simple computation shows that the
standard measure $M$ for $(a_1, a_2)$ assigns measures $d$, $1 - d$ to the points
$(p_1, 1 - p_1)$, $(p_2, 1 - p_2)$, where $d = (a_1 + a_2)/2$, $p_1 = a_1/2d$,
$p_2 = (1 - a_1)/2(1 - d)$. Thus if $a_1 \le a_2$, we have

$$c_M(x) = \begin{cases} 0 & \text{for } 0 \le x \le p_1 , \\ d(x - p_1) & \text{for } p_1 \le x \le p_2 , \\ d(p_2 - p_1) + (x - p_2) & \text{for } p_2 \le x \le 1 ; \end{cases}$$

if $a_2 \le a_1$, we interchange $a_1$, $a_2$ and replace $d$ by $1 - d$ in the
above description.
For two binomial experiments $(a_1, a_2) = a$, $(b_1, b_2) = b$ with standard
measures $M$, $m$, the relation between $c_M$ and $c_m$ is geometrically clear:

$a \succ b$ if and only if

$\min[p_1(a), p_2(a)] \le \min[p_1(b), p_2(b)]$ and $\max[p_1(a), p_2(a)] \ge \max[p_1(b), p_2(b)]$.
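
The comparison rule for binomial experiments with $N = 2$ translates directly into code. The following sketch uses hypothetical function names (`posterior_points`, `sufficient_for`) and made-up example experiments; it assumes $0 < a_1 + a_2 < 2$ so that neither denominator vanishes.

```python
from fractions import Fraction as F

def posterior_points(a1, a2):
    """The two points p_1, p_2 carrying the standard measure of (a1, a2).

    p_1 = a1 / (2d) arises from x = 1 and p_2 = (1 - a1) / (2(1 - d)) from
    x = 0, with d = (a1 + a2) / 2.
    """
    d = (a1 + a2) / 2
    return a1 / (2 * d), (1 - a1) / (2 * (1 - d))

def sufficient_for(a, b):
    """True when the interval spanned by a's points contains that of b."""
    pa, pb = posterior_points(*a), posterior_points(*b)
    return min(pa) <= min(pb) and max(pa) >= max(pb)

# Hypothetical binomial experiments.
a = (F(1, 4), F(3, 4))
b = (F(1, 3), F(2, 3))
e = (F(1, 10), F(1, 2))

print(posterior_points(*a), posterior_points(*b), posterior_points(*e))
print(sufficient_for(a, b), sufficient_for(b, a))
print(sufficient_for(a, e), sufficient_for(e, a))
```

Here `a` is sufficient for `b`, while `a` and `e` are not comparable, since each spans a point that the other does not reach.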

As an application of the comparison of binomial experiments, we consider the
following $2 \times 2$ table problem. There are two characteristics $H$, $S$, whose
proportions $h$, $s$ in the general population are known. Moreover it is known that
the proportion of $HS$ in the general population is either $hs$ or a definite
alternative $c$. A sample of size $k$ is to be selected, after which some action is
to be taken, whose worth depends only on whether $\Pr\{HS\} = hs$ or
$\Pr\{HS\} = c$. Suppose that, for each observation, the statistician may select an
individual at random from $H$ or $S$ or non-$H$ or non-$S$; he has a choice among
four binomial experiments which we denote by $a_H$, $a_S$, $a_{CH}$, $a_{CS}$. If it
should happen that one of these, say $a_H$, is more informative than each of the
other three, then it follows from the extension of theorem 12 that a sample of $k$
individuals from $H$ is more informative than any


102 


SECOND  BERKELEY  SYMPOSIUM :  BLACKWELL 


other combination of $k$ experiments from $a_H$, $a_S$, $a_{CH}$, $a_{CS}$ (a sample
of $k$ individuals from $H$ can then also be shown to be more informative than any
other sequentially selected set of $k$ experiments from $a_H$, $a_S$, $a_{CH}$,
$a_{CS}$, where the decision about which of the four experiments to do next depends
on the results already obtained, but we shall not go into this).

The four experiments are $a_H = (s, c/h)$, $a_S = (h, c/s)$,
$a_{CH} = [s, (s - c)/(1 - h)]$, and $a_{CS} = [h, (h - c)/(1 - s)]$. Computation of
$p_1$, $p_2$ for each of the four experiments and using the condition given above
for $a \succ b$ yields the following conditions:

For $H \succ S$:   $h \le s$
    $H \succ CH$:  $h \le s$, $h + s \le 1$
    $H \succ CS$:  $h + s \le 1$
    $S \succ CS$:  $s \le h$, $s + h \le 1$
    $S \succ CH$:  $h + s \le 1$
    $CS \succ CH$: $h \le s$

Without loss of generality, we may suppose that $h$ is the smallest of the four
numbers $h$, $s$, $1 - h$, $1 - s$. Then $a_H \succ a_S \succ a_{CH}$,
$a_H \succ a_{CS} \succ a_{CH}$ and $a_S$, $a_{CS}$ are not comparable unless
$h = s$ or $h = 1 - s$. Thus the procedure which always selects the characteristic
which is rarest in the general population is more informative than any other
procedure of the class considered. The experiment $a_{CH}$ is the least informative
of the four, while $a_S$, $a_{CS}$ are intermediate.
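
The ordering of the four sampling schemes can be reproduced for particular numbers. The sketch below is illustrative: the values of `h`, `s`, and `c` are made up (with $h$ the smallest of $h$, $s$, $1 - h$, $1 - s$), and the helper names are hypothetical. It builds the four binomial experiments from the formulas in the text and applies the interval criterion for $N = 2$.

```python
from fractions import Fraction as F

def four_experiments(h, s, c):
    """Binomial experiments for sampling from H, S, non-H (CH), non-S (CS)."""
    return {
        "H":  (s, c / h),
        "S":  (h, c / s),
        "CH": (s, (s - c) / (1 - h)),
        "CS": (h, (h - c) / (1 - s)),
    }

def posterior_points(a1, a2):
    d = (a1 + a2) / 2
    return a1 / (2 * d), (1 - a1) / (2 * (1 - d))

def sufficient_for(a, b):
    pa, pb = posterior_points(*a), posterior_points(*b)
    return min(pa) <= min(pb) and max(pa) >= max(pb)

# Hypothetical proportions: hs = 3/50, alternative c = 1/10.
h, s, c = F(1, 5), F(3, 10), F(1, 10)
ex = four_experiments(h, s, c)

for first, second in [("H", "S"), ("S", "CH"), ("H", "CS"), ("CS", "CH"),
                      ("S", "CS"), ("CS", "S")]:
    print(first, second, sufficient_for(ex[first], ex[second]))
```

With these numbers sampling from $H$ dominates the other three schemes, non-$H$ is dominated by the rest, and $S$ and non-$S$ are not comparable, matching the ordering stated in the text.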

A second example, which suggests that for $N > 2$, the concept $\supset$ is quite
strong (and $\succ$ is at least as strong as $\supset$), is the binomial experiment
$(0, \tfrac{1}{2}, 1) = \alpha$. The standard measure $M$ for $\alpha$ assigns
measure $\tfrac{1}{2}$ to each of $Q_1 = (0, \tfrac{1}{3}, \tfrac{2}{3})$ and
$Q_2 = (\tfrac{2}{3}, \tfrac{1}{3}, 0)$. Theorem 4 shows that the measures
$m \subset M$ are exactly those concentrated on the line segment joining $Q_1$,
$Q_2$; the binomial experiments $\beta = (a_1, a_2, a_3)$ whose $m$ is concentrated
on this line are those for which $a_2 = (a_1 + a_3)/2$. Thus $\alpha$ is not more
informative than (0, |) or than
$(\tfrac{1}{2} - \epsilon, \tfrac{1}{2}, \tfrac{1}{2} + 2\epsilon)$,
$\epsilon > 0$ for instance, and for any $\beta \subset \alpha$, a suitable
arbitrarily small perturbation of the $a$'s destroys the relationship.